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ERROR  ANALYSIS  OF  THE  5-STEP  LANCZOS  METHOD  IN 

FINITE  PRECISION 

ERIN  CARSON  AND  JAMES  DEMMEL 

Abstract.  The  s-step  Lanczos  method  is  an  attractive  alternative  to  the  classical  Lanczos 
method  as  it  enables  an  0{s)  reduction  in  data  movement  over  a  fixed  number  of  iterations.  This 
can  significantly  improve  performance  on  modern  computers.  In  order  for  s-step  methods  to  be  widely 
adopted,  it  is  important  to  better  understand  their  error  properties.  Although  the  s-step  Lanczos 
method  is  equivalent  to  the  classical  Lanczos  method  in  exact  arithmetic,  empirical  observations 
demonstrate  that  it  can  behave  quite  differently  in  finite  precision.  In  the  s-step  Lanczos  method, 
the  computed  Lanczos  vectors  can  lose  orthogonality  at  a  much  quicker  rate  than  the  classical  method, 
a  property  which  seems  to  worsen  with  increasing  s. 

In  this  paper,  we  present,  for  the  first  time,  a  complete  rounding  error  analysis  of  the  s-step 
Lanczos  method.  Our  methodology  is  analogous  to  Paige’s  rounding  error  analysis  for  the  classical 
Lanczos  method  [IMA  J.  Appl.  Math.,  18(3):341-349,  1976].  Our  analysis  gives  upper  bounds  on  the 
loss  of  normality  of  and  orthogonality  between  the  computed  Lanczos  vectors,  as  well  as  a  recurrence 
for  the  loss  of  orthogonality.  The  derived  bounds  are  very  similar  to  those  of  Paige  for  classical 
Lanczos,  but  with  the  addition  of  an  amplification  term  which  depends  on  the  condition  number 
of  the  Krylov  bases  computed  every  s-steps.  Our  results  confirm  theoretically  what  is  well-known 
empirically:  the  conditioning  of  the  Krylov  bases  plays  a  large  role  in  determining  finite  precision 
behavior. 

Key  words.  Krylov  subspace  methods,  error  analysis,  finite  precision,  roundoff,  Lanczos,  avoid¬ 
ing  communication,  orthogonal  bases 

AMS  subject  classifications.  65G50,  65F10,  65F15,  65N15,  65N12 


1.  Introduction.  Given  an  n-by-n  symmetric  matrix  A  and  a  starting  vector  vq 
with  unit  2-norm,  m  steps  of  the  Lanczos  method  [21]  theoretically  produces  the  or¬ 
thonormal  matrix  Vm  =  [tq,  ■  •  ■ ,  Vm]  and  the  (m-l-l)-by-(m-l-l)  symmetric  tridiagonal 
matrix  such  that 

AVjn  —  “t”  ’  (^■^) 

When  m  =  n  —  1,  the  eigenvalues  T„_i  are  the  eigenvalues  of  A.  In  practice,  the 
eigenvalues  of  T  are  still  good  approximations  to  the  eigenvalues  of  A  when  m  <C  n—l, 
which  makes  the  Lanczos  method  attractive  as  an  iterative  procedure.  Many  Krylov 
subspace  methods  (KSMs),  including  those  for  solving  linear  systems  and  least  squares 
problems,  are  based  on  the  Lanczos  method.  In  turn,  these  various  Lanczos-based 
methods  are  the  core  components  in  numerous  scientific  applications. 

Classical  implementations  of  Krylov  methods,  the  Lanczos  method  included,  re¬ 
quire  one  or  more  sparse  matrix-vector  multiplications  (SpMVs)  and  one  or  more 
inner  product  operations  in  each  iteration.  These  computational  kernels  are  both 
communication-bound  on  modern  computer  architectures.  To  perform  an  SpMV, 
each  processor  must  communicate  entries  of  the  source  vector  it  owns  to  other  pro¬ 
cessors  in  the  parallel  algorithm,  and  in  the  sequential  algorithm  the  matrix  A  must 
be  read  from  slow  memory.  Inner  products  involve  a  global  reduction  in  the  paral¬ 
lel  algorithm,  and  a  number  of  reads  and  writes  to  slow  memory  in  the  sequential 
algorithm  (depending  on  the  size  of  the  vectors  and  the  size  of  the  fast  memory). 

Thus,  many  efforts  have  focused  on  communication-avoiding  Krylov  subspace 
methods  (CA-KSMs),  or  s-step  Krylov  methods,  which  can  perform  s  iterations  with 
0{s)  less  communication  than  classical  KSMs;  see,  e.g.,  [4,  5,  7,  9,  10,  16,  17,  8,  33,  35]. 
In  practice,  this  can  translate  into  significant  speedups  for  many  problems  [24] . 
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Equally  important  to  the  performance  of  each  iteration  is  the  convergence  rate  of 
the  method,  i.e.,  the  total  number  of  iterations  required  until  the  desired  convergence 
criterion  is  met.  Although  theoretically  the  Lanczos  process  described  by  (1.1)  pro¬ 
duces  an  orthogonal  basis  and  a  tridiagonal  matrix  similar  to  A  after  n  steps,  these 
properties  need  not  hold  in  finite  precision.  The  effects  of  roundoff  error  on  the  ideal 
Lanczos  process  were  known  to  Lanczos  when  he  published  his  algorithm  in  1950. 
Since  then,  much  research  has  been  devoted  to  better  understanding  this  behavior, 
and  to  devise  more  robust  and  stable  algorithms. 

Although  s-step  Krylov  methods  are  mathematically  equivalent  to  their  classical 
counterparts  in  exact  arithmetic,  it  perhaps  comes  as  no  surprise  that  their  finite 
precision  behavior  may  differ  significantly,  and  that  the  theories  developed  for  classical 
methods  in  finite  precision  do  not  hold  for  the  s-step  case.  It  has  been  empirically 
observed  that  the  behavior  of  s-step  Krylov  methods  deviates  further  from  that  of 
the  classical  method  as  s  increases,  and  that  the  severity  of  this  deviation  is  heavily 
influenced  by  the  polynomials  used  for  the  s-step  Krylov  bases  (see,  e.g.,  [1,  4,  17,  18]). 

Arguably  the  most  revolutionary  work  in  the  finite  precision  analysis  of  classical 
Lanczos  was  a  series  of  papers  published  by  Paige  [25,  26,  27,  28].  Paige’s  analysis 
succinctly  describes  how  rounding  errors  propagate  through  the  algorithm  to  impede 
orthogonality.  These  results  were  developed  to  give  theorems  which  link  the  loss  of 
orthogonality  to  convergence  of  the  computed  eigenvalues  [28].  No  analogous  theory 
currently  exists  for  the  s-step  Lanczos  method. 

In  this  paper,  we  present,  for  the  hrst  time,  a  complete  rounding  error  analysis 
of  the  s-step  Lanczos  method.  Our  analysis  here  for  s-step  Lanczos  closely  follows 
Paige’s  rounding  error  analysis  for  orthogonality  in  classical  Lanczos  [27]. 

We  present  upper  bounds  on  the  normality  of  and  orthogonality  between  the 
computed  Lanczos  vectors,  as  well  as  a  recurrence  for  the  loss  of  orthogonality.  The 
derived  bounds  are  very  similar  to  those  of  Paige  for  classical  Lanczos,  but  with 
the  addition  of  an  amplification  term  which  depends  on  the  condition  number  of 
the  Krylov  bases  computed  every  s  steps.  Our  results  confirm  theoretically  what 
is  well-known  empirically:  the  conditioning  of  the  Krylov  bases  plays  a  large  role  in 
determining  finite  precision  behavior.  In  particular,  if  one  can  guarantee  that  the  basis 
condition  number  is  not  too  large  throughout  the  iteration,  the  loss  of  orthogonality 
in  the  s-step  Lanczos  method  should  not  be  too  much  worse  than  in  classical  Lanczos. 
As  Paige’s  subsequent  groundbreaking  convergence  analysis  [28]  was  based  largely  on 
the  results  in  [27],  our  analysis  here  similarly  serves  as  a  stepping  stone  to  further 
understanding  of  the  s-step  Lanczos  method. 

The  remainder  of  this  paper  is  outlined  as  follows.  In  Section  2,  we  present 
related  work  in  s-step  Krylov  methods  and  the  analysis  of  finite  precision  Lanczos.  In 
Section  3,  we  review  a  variant  of  the  Lanczos  method  and  derive  the  corresponding 
s-step  Lanczos  method,  as  well  as  provide  a  numerical  example  that  will  help  motivate 
our  analysis.  In  Section  4,  we  Hrst  state  our  main  result  in  Theorem  4.2  and  comment 
on  its  interpretation;  the  rest  of  the  section  is  devoted  to  its  proof.  In  Section  5, 
we  recall  our  numerical  example  from  Section  3  in  order  to  demonstrate  the  bounds 
proved  in  Section  4.  Section  6  concludes  with  a  discussion  of  future  work. 

2.  Related  work.  We  briefly  review  related  work  in  s-step  Krylov  methods  as 
well  as  work  related  to  the  analysis  of  classical  Krylov  methods  in  finite  precision. 

2.1.  s-step  Krylov  subspace  methods.  The  term  ‘s-step  Krylov  method’, 
first  used  by  Chronopoulos  and  Gear  [6] ,  describes  variants  of  Krylov  methods  where 
the  iteration  loop  is  split  into  blocks  of  s  iterations.  Since  the  Krylov  subspaces 
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required  to  perform  s  iterations  of  updates  are  known,  bases  for  these  subspaces  can 
be  computed  upfront,  inner  products  between  basis  vectors  can  be  computed  with  one 
block  inner  product,  and  then  s  iterations  are  performed  by  updating  the  coordinates 
in  the  generated  Krylov  bases  (see  Section  3  for  details).  Many  formulations  and 
variations  have  been  derived  over  the  past  few  decades  with  various  motivations, 
namely  increasing  parallelism  (e.g.,  [6,  35,  36])  and  avoiding  data  movement,  both 
between  levels  of  the  memory  hierarchy  in  sequential  methods  and  between  processors 
in  parallel  methods.  A  thorough  treatment  of  related  work  can  be  found  in  [17]. 

Many  empirical  studies  of  s-step  Krylov  methods  found  that  convergence  often 
deteriorated  using  s  >  5  due  to  the  inherent  instability  of  the  monomial  basis.  This 
motivated  research  into  the  use  of  better-conditioned  bases  (e.g.,  Newton  or  Cheby- 
shev  polynomials)  for  the  Krylov  subspace,  which  allowed  convergence  for  higher  s 
values  (see,  e.g.,  [1,  16,  18,  31]).  Hoemmen  has  used  a  novel  matrix  equilibration  and 
balancing  approach  to  achieve  similar  effects  [17]. 

The  term  ‘communication-avoiding  Krylov  methods’  refers  to  s-step  Krylov  meth¬ 
ods  and  implementations  which  aim  to  improve  performance  by  asymptotically  de¬ 
creasing  communication  costs,  possibly  both  in  computing  inner  products  and  comput¬ 
ing  the  s-step  bases,  for  both  sequential  and  parallel  algorithms;  see  [9,  17].  Hoemmen 
et  al.  [17,  24]  derived  communication-avoiding  variants  of  Lanczos,  Arnoldi,  Conjugate 
Gradient  (CG)  and  the  Generalized  Minimum  Residual  Method  (GMRES).  Details  of 
nonsymmetric  Lanczos-based  CA-KSMs,  including  communication-avoiding  versions 
of  Biconjugate  Gradient  (BIGG)  and  Stabilized  Biconjugate  Gradient  (BICGSTAB) 
can  be  found  in  [4].  Although  potential  performance  improvement  is  our  primary 
motivation  for  studying  these  methods,  we  use  the  general  term  ‘s-step  methods’  here 
as  our  error  analysis  is  independent  of  performance. 

Many  efforts  have  been  devoted  specifically  to  the  s-step  Lanczos  method.  The 
first  s-step  Lanczos  methods  known  in  the  literature  are  due  to  Kim  and  Chronopou- 
los,  who  derived  a  three-term  symmetric  s-step  Lanczos  method  [19]  as  well  as  a 
three-term  nonsymmetric  s-step  Lanczos  method  [20] .  Hoemmen  derived  a  three-term 
communication-avoiding  Lanczos  method,  CA-Lanczos  [17].  Although  the  three-term 
variants  require  less  memory,  their  numerical  accuracy  can  be  worse  than  implemen¬ 
tations  which  use  two  coupled  two-term  recurrences  [15].  A  two-term  communication¬ 
avoiding  nonsymmetric  Lanczos  method  (called  GA-BIOG,  based  on  the  ‘BIOG’  ver¬ 
sion  of  nonsymmetric  Lanczos  of  Gutknecht  [14])  can  be  found  in  [2].  This  work 
includes  the  derivation  of  a  new  version  of  the  s-step  Lanczos  method,  equivalent  in 
exact  arithmetic  to  the  variant  used  by  Paige  [27].  It  uses  a  two-term  recurrence  like 
BIOG,  but  is  restricted  to  the  symmetric  case  and  uses  a  different  starting  vector. 

For  s-step  KSMs  that  solve  linear  systems,  increased  roundoff  error  in  finite  pre¬ 
cision  can  decrease  the  maximum  attainable  accuracy  of  the  solution,  resulting  in  a 
less  accurate  solution  than  found  by  the  classical  method.  A  quantitative  analysis  of 
roundoff  error  in  GA-GG  and  GA-BIGG  can  be  found  in  [3] .  Based  on  the  work  of  [34] 
for  conventional  KSMs,  we  have  also  explored  implicit  residual  replacement  strategies 
for  GA-GG  and  GA-BIGG  as  a  method  to  limit  the  deviation  of  true  and  computed 
residuals  when  high  accuracy  is  required  (see  [3]). 

2.2.  Error  analysis  of  the  Lanczos  method.  Lanczos  and  others  recognized 
early  on  that  rounding  errors  could  cause  the  Lanczos  method  to  deviate  from  its 
ideal  theoretical  behavior.  Since  then,  various  efforts  have  been  devoted  to  analyzing, 
and  explaining,  and  improving  the  finite  precision  Lanczos  method. 

Widely  considered  to  be  the  most  significant  development  was  the  series  of  pa- 
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pers  by  Paige  discussed  in  Section  1.  Another  important  development  was  due  to 
Greenbaum  and  Strakos,  who  performed  a  backward-like  error  analysis  which  showed 
that  finite  precision  Lanczos  and  CG  behave  very  similarly  to  the  exact  algorithms 
applied  to  any  of  a  certain  class  of  larger  matrices  [12].  Paige  has  recently  shown 
a  similar  type  of  augmented  stability  for  the  Lanczos  process  [29].  There  are  many 
other  analyses  of  the  behavior  of  various  KSMs  in  finite  precision,  including  some 
more  recent  results  due  to  Wulling  [37]  and  Zemke  [38];  for  a  thorough  overview  of 
the  literature,  see  [22,  23]. 

A  number  of  strategies  for  maintaining  the  orthogonality  among  the  Lanczos 
vectors  were  inspired  by  the  analysis  of  Paige,  such  as  selective  reorthogonalization  [30] 
and  partial  reorthogonalization  [32].  Recently,  Gustafsson  et  al.  have  extended  such 
reorthogonalization  strategies  for  classical  Lanczos  to  the  s-step  case  [13]. 

3.  The  s-step  Lanczos  method.  The  classical  Lanczos  method  is  shown  in 
Algorithm  1.  We  use  the  same  variant  of  Lanczos  as  used  by  Paige  in  his  error 
analysis  for  classical  Lanczos  [27]  to  allow  easy  comparison  of  results.  This  is  the 
first  instance  of  an  s-step  version  of  this  particular  Lanczos  variant;  other  existing 
s-step  Lanczos  variants  are  described  in  Section  2.1.  Note  that  as  in  [27]  our  analysis 
will  assume  no  breakdown  occurs  and  thus  breakdown  conditions  are  not  discussed 
here.  We  now  give  a  derivation  of  s-step  Lanczos,  obtained  from  classical  Lanczos  in 
Algorithm  1. 


Algorithm  1  Lanczos 

Require:  n-by-n  real  symmetric  matrix  A  and  length-n  starting  vector  vq  such  that 

Ikolb  =  1 

1;  Uo  =  Avo 

2:  for  m  =  0, 1, . . .  until  convergence  do 
3: 

4:  —  '^m 

5*  /?m+l  —  ||'^^m||2 

6:  —  '^mj  /^m+l 

7:  UYn-\-l  — 

8:  end  for 


Suppose  we  are  beginning  iteration  m  =  sk  where  k  €  N  and  0  <  5  €  N.  By 
induction  on  lines  6  and  7  of  Algorithm  1,  we  can  write 

'^sk-\-j  •)  '^sk-\-j  € /Cg+i  (A,  Ugfc) -l- (A,  (3-1) 

for  j  G  {0, . . . ,  s},  where  /Ci(A,  x)  =  spanja:.  Ax, . . . ,  A*“^a;}  denotes  the  Krylov  sub¬ 
space  of  dimension  i  of  matrix  A  with  respect  to  vector  x.  Note  that  since  uq  =  Avq, 
if  A:  =  0  we  have 

Vj,Uj  e  ICs+2iA,Vo). 

for  j  e  {0,...,s}. 

For  A:  >  0,  we  then  define  ‘basis  matrix’  =  [Vk,l^k],  where  Vk  and  Uk  are  size 
n-by-(s  -I-  1)  matrices  whose  columns  form  bases  for  ICs+i{A,Vsk)  and  ICs+i{A,Usk), 
respectively.  For  k  =  0,  we  define  A’o  to  be  a  size  n-by-(s  -I-  2)  matrix  whose  columns 
form  a  basis  for  /Cs+2(A, uq).  Then  by  (3.1),  we  can  represent  Vsk+j  and  Usk+j,  for 
j  €  {0, ,  s},  by  their  coordinates  (denoted  with  primes)  in  i-e., 

'Csfc-t-j  —  klsk+j  —  y^k’^kjj' 


(3.2) 
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Note  that  for  fc  =  0,  the  coordinate  vectors  are  length  s  +  2  and  for  fc  >  0,  the 
coordinate  vectors  are  length  2s  +  2.  We  can  write  a  similar  equation  for  auxiliary 
vector  Wsk+j,  i-e.,  Wsk+j  =  ykw'k  j  for  J  €  {0, . . . ,  s  —  1}.  We  define  also  the  Gram 
matrix  Gk  =  y^yk,  which  is  size  (s  +  2)-by-(s  +  2)  for  A:  =  0  and  (2s  +  2)-by-(2s  +  2) 
for  fc  >  0.  Using  this  matrix,  the  inner  products  in  lines  3  and  5  can  be  written 

a,k+j  =  vJk+jU,k+j  =  v'kjykyku'k^j  =  v'kjGku'k^^  and  (3.3) 

Psk+j+i  =  (w^k+jWsk+jf'^  =  (w’ljylykw’k^jf^'^  =  (w'^jGkw'kjfG  ^  (3,4) 

We  assume  that  the  bases  are  generated  via  polynomial  recurrences  represented  by 
the  matrix  Bk^  which  is  in  general  upper  Hessenberg  but  often  tridiagonal  in  practice. 
The  recurrence  can  thus  be  written  in  matrix  form  as 

Ay^^  =  yuBk 

where  Bk  is  size  (s  +  2)-by-(s  +  2)  for  A:  =  0  and  size  (2s  +  2)-by-(2s  +  2)  for  fc  >  0, 
and  =  [Vfc[/s,0s,i]^,0„,i,ri'fc[/s,0s,i]'^,0„4].  Therefore,  for  j  €  {0, . . . ,  s  -  1}, 

Avsk+j+i  =  AykVkj^i  =  =  ykBkVkj+i-  (3-5) 

Thus,  to  compute  iterations  sfc  +  1  through  sfc  +  s  in  s-step  Lanczos,  we  first 
generate  basis  matrix  yk  such  that  (3.5)  holds,  and  we  compute  the  Gram  matrix 
Gk  from  the  resulting  basis  matrix.  Then  updates  to  the  length-n  vectors  can  be 
performed  by  updating  instead  the  length- (2s  -I-  2)  coordinates  for  those  vectors  in 
yk-  Inner  products  and  multiplications  with  A  become  smaller  operations  which 
can  be  performed  locally,  as  in  (3.3),  (3.4),  and  (3.5).  The  complete  s-step  Lanczos 
algorithm  is  presented  in  Algorithm  2.  Note  that  in  Algorithm  2  we  show  the  length- 
n  vector  updates  in  each  inner  iteration  (lines  16  and  18)  for  clarity,  although  these 
vectors  play  no  part  in  the  inner  loop  iteration  updates.  In  practice,  the  basis  change 
operation  (3.2)  can  be  performed  on  a  block  of  coordinate  vectors  at  the  end  of  each 
outer  loop  to  recover  Vsk+i  and  Usk+i,  for  i  €  {1, . . . ,  s}. 

3.1.  A  numerical  example.  We  give  a  brief  example  to  demonstrate  the  behav¬ 
ior  of  s-step  Lanczos  in  finite  precision  and  to  motivate  our  theoretical  analysis.  We 
run  s-step  Lanczos  (Algorithm  2)  on  a  2D  Poisson  matrix  with  n  =  256,  ||  A||2  =  7.93, 
using  a  random  starting  vector.  The  same  starting  vector  is  used  in  all  tests,  which 
were  run  using  double  precision.  Results  for  classical  Lanczos  run  on  the  same  prob¬ 
lem  are  shown  in  Figure  3.1  for  comparison.  In  Figure  3.2,  we  show  s-step  Lanczos 
results  for  s  =  2  (left),  s  =  4  (middle),  and  s  =  8  (right),  using  monomial  (top),  New¬ 
ton  (middle),  and  Ghebyshev  (top)  polynomials  for  computing  the  bases  in  line  3. 
The  plots  show  the  number  of  eigenvalue  estimates  (Ritz  values)  that  have  converged, 
within  some  relative  tolerance,  to  a  true  eigenvalue  over  the  iterations.  Note  that  we 
do  not  count  duplicates,  i.e.,  multiple  Ritz  values  that  have  converged  to  the  same 
eigenvalue  of  A.  The  solid  black  line  y  =  x  represents  the  upper  bound. 

From  Figure  3.2  we  see  that  for  s  =  2,  s-step  Lanczos  with  the  monomial,  Newton, 
and  Ghebyshev  bases  all  well-replicate  the  convergence  behavior  of  classical  Lanczos; 
for  the  Ghebyshev  basis  the  plots  look  almost  identical.  However,  as  s  increases, 
we  see  that  both  convergence  rate  and  accuracy  to  which  we  can  find  approximate 
eigenvalues  within  n  iterations  decreases  for  all  bases.  This  is  clearly  the  most  drastic 
for  the  monomial  basis  case;  e.g.,  for  the  Ghebyshev  and  Newton  bases  with  s  =  8, 
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Algorithm  2  s-step  Lanczos 

Require:  n-hy-n  real  symmetric  matrix  A  and  length-n  starting  vector  vq  such  that 
Ikolh  =  1 
1;  Uo  =  Avq 

2:  for  fc  =  0, 1, . . .  until  convergence  do 
3:  Compute  yk  with  change  of  basis  matrix  Bk 

4:  Compute  Gk  =  yl yk 

5:  4,0  =  ei 

6:  if  A:  =  0  then 

7;  m'o^o  =  ^kGl 

8:  else 

9:  0  =  es+2 

10:  end  if 

11;  for  j  =  0, 1, . . . ,  s  —  1  do 
12;  OLsk+j  =  v'^jGku'i^j 

13: 

14:  /3sk+j+l  = 

13:  4,J  +  1  =  '^'k,jl  Psk+i  +  l 

16:  'Usfc-i-j-i-i  = 

ll"-  Hfc.J+l  ~  +  l  ~  Psk+j+l^i^  j 

18:  Hsfc+J  +  l  —  yk'^ky  +  1 

19:  end  for 

20:  end  for 


Fig.  3.1.  Number  of  converged  Ritz  values  versus  iteration  number  for  classical  Lanczos. 


we  can  at  least  still  find  eigenvalues  to  within  relative  accuracy  ^/e  at  the  same  rate 
as  the  classical  case. 

It  is  clear  that  the  choice  of  basis  used  to  generate  Krylov  subspaces  affects  the 
behavior  of  the  method  in  finite  precision.  Although  this  is  well-studied  empirically 
in  the  literature,  many  theoretical  questions  remain  open  about  exactly  how,  where, 
and  to  what  extent  the  properties  of  the  bases  affect  the  method’s  behavior.  Our 
analysis  is  a  significant  step  toward  addressing  these  questions. 

4.  The  s-step  Lanczos  method  in  finite  precision.  Throughout  our  analy¬ 
sis,  we  use  a  standard  model  of  floating  point  arithmetic  where  we  assume  the  compu¬ 
tations  are  carried  out  on  a  machine  with  relative  precision  e  (see  [11]).  Throughout 
the  analysis  we  ignore  e  terms  of  order  >  1,  which  have  negligible  effect  on  our  results. 
We  also  ignore  underflow  and  overflow.  Following  Paige  [27],  we  use  the  e  symbol  to 
represent  the  relative  precision  as  well  as  terms  whose  absolute  values  are  bounded 
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Fig.  3.2.  Number  of  converged  Ritz  values  versus  iteration  number  for  s-step  Lanczos  using 
monomial  (top),  Newton  (middle),  and  Chebyshev  (bottom)  bases  for  s  =  2  (left),  s  =  4  (middle), 
and  s  =  8  (right). 


by  the  relative  precision. 

We  will  model  floating  point  computation  using  the  following  standard  conven¬ 
tions  (see,  e.g.,  [11]):  for  vectors  u,  u  S  K",  matrices  A  G  and  G  €  and 

scalar  a, 


fl{u  —  av)  =u  —  av  —  Sw, 

<  (  u  -1-  2  an  )e. 

fl{v'^u)  =(v  +  Sv)^u, 

|i5n|  <  ne\v\, 

fl{Au)  ={A  -\-  SA)u, 

<  me\A\, 

fl{A^A)  =A'^A  +  5E, 

\5E\  <  ne  A  ,  and 

fl{u^{Gv))  ={u  +  6uf{G  +  6G)v, 

|i5u|  <  ne\u\,  [dG]  <  ne\G\ 

where  flQ  represents  the  evaluation  of  the  given  expression  in  floating  point  arithmetic 
and  terms  with  S  denote  error  terms.  We  decorate  quantities  computed  in  finite 
precision  arithmetic  with  hats,  e.g.,  if  we  are  to  compute  the  expression  a  =  v^u  in 
finite  precision,  we  get  a  =  fl{u^u). 

We  first  prove  the  following  lemma,  which  will  be  useful  in  our  analysis. 

Lemma  4.1.  Assume  we  have  rank-r  matrix  Y  G  where  n  >  r.  Let 

denote  the  pseudoinverse  ofY ,  i.e.,  Y'^  =  (Y^Y)~^Y'^ .  Then  for  any  vector  x  G  K’', 
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we  can  bound 

lliri  N||2<|||r|||2||a:||2<r||yx|b. 

where  T  =  ||i^+||2  ||  II2  <  V^||^’''||2  II^L' 

Proof.  We  have 

II  irila^l  II2  <  II  |r|  Ibllxlb  <  II  |y|  ||2r+yx||2  <  II  |r|  blir+ihUra^lb  <  r||ya:||2. 

□ 

We  note  that  the  term  T  can  be  thought  of  as  a  type  of  condition  number  for 
the  matrix  Y.  In  the  analysis,  we  will  apply  the  above  lemma  to  the  computed 
‘basis  matrix’  y^..  We  assume  throughout  that  the  generated  bases  and  Vk  are 
numerically  full  rank.  That  is,  all  singular  values  of  lAk  and  Vk  are  greater  than 
en  ■  2L*°S2  0’iJ  where  ai  is  the  largest  singular  value  of  A.  The  results  of  this  section 
are  summarized  in  the  following  theorem: 

Theorem  4.2.  Assume  that  Algorithm  2  is  implemented  in  floating  point  with 
relative  precision  e  and  applied  for  sk+j  steps  to  the  n-by-n  real  symmetric  matrix  A, 
starting  with  vector  vq  with  \\vo\\2  =  Tet  cr  =  ||  |A|||2/||2l||2  =  ||  |,Bfc| ||2/||4l||2, 

where  Bk  is  defined  in  (3.5),  and  let 

fk=  max  ||j)+||2||  iJ^il  II2  >  1  and  fk=  max  n. 

Then  ask^j,  o,nd  Vsk-\-j-\-i  'will  be  computed  such  that 

-^^sk-\-j  —  ^sk-\-jTsk-\-j  ^sk-\-j-\-l'bsk-\-j-\-l^sk-\-j-\-l  bVsk-\-j-) 

with 

^sk-\-j  —  ["^0 1  'bsk-\-j] 
dVsk-\-j  —  bvij  ■  .  ■  ,  SVsk-\-j'\ 

Tsk-\-j 

and 

||di)sfc+j||2  <  e((n+2s+5)cr  +  (4s+9)ffc  +  (10s+16))ffc||2l||2,  (4.1) 

Psk+j+ilvJk+jnsk+j+il  <  2e(n+lls+15)||v4||2ffc,  (4.2) 

IvJk+j+iVsk+j+i  -  1|  <  e(n+8s+12)f^,  and  (4.3) 

Psk+j  +  l  +  “sfc+j  +  Psk+j  ~  ll^^sfc+illi  ^ 

4e(sA:+j +2)  ((n+2s+5)(T  +  (4s+9)Tfc  +  (3n+40s+58))f  ^||^||2.  (4-4) 

Furthermore,  if  Rsk+j  is  the  strictly  upper  triangular  matrix  such  that 
^sk+Tsk+j  =  Rsk+j  4"  diagiy^k+j^sk+j)  +  Rsk+j) 
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then 


^sk+jh^sk+j  ^sk+j^sk+j  Psk+j  +  l^sk+j’^sk+j  +  l^sk+j+l~^^sk+ji  (^■^) 

where  Hgk+j  is  upper  triangular  with  elements  p  such  that 

|»7i,il  <2e(n+lls+15)|jA||2f|,  and  for  i  G  {2, . . . ,  sk+j+1}, 

\ViA  <4e(n+lls+15)||A||2f^, 

<2e((n+2s+5)(j+(4s+9)ffc  +  n+18s+28)f^|| AII2,  and 
\pi,i\  <2e((n+2s+5)(j+(4s+9)ffc  +  (10s+16))ffc||A||2,  for  (.&{!,..  .,i-2}. 

Remarks.  This  generalizes  Paige  [27]  as  follows.  The  bounds  in  Theorem  4.2  give 
insight  into  how  orthogonality  is  lost  in  the  hnite  precision  s-step  Lanczos  algorithm. 
Equation  (4.1)  bounds  the  error  in  the  columns  of  the  resulting  perturbed  Lanczos 
recurrence.  How  far  the  Lanczos  vectors  can  deviate  from  unit  2-norm  is  given  in  (4.3), 
and  (4.2)  bounds  how  far  adjacent  vectors  are  from  being  orthogonal.  The  bound 
in  (4.4)  describes  how  close  the  columns  of  AVsk+j  and  Tsk+j  are  in  size.  Finally,  (4.5) 
can  be  thought  of  as  a  recurrence  for  the  loss  of  orthogonality  between  Lanczos  vectors, 
and  shows  how  errors  propagate  through  the  iterations. 

One  thing  to  notice  about  the  bounds  in  Theorem  4.2  is  that  they  depend  heavily 
on  the  term  T^,  which  is  a  measure  of  the  conditioning  of  the  computed  s-step  Krylov 
bases.  This  indicates  that  if  T^  is  controlled  in  some  way  to  be  near  constant,  i.e., 
ffc  =  0(1))  the  bounds  in  Theorem  4.2  will  be  on  the  same  order  as  Paige’s  analogous 
bounds  for  classical  Lanczos  [27],  and  thus  we  can  expect  orthogonality  to  be  lost  at 
a  similar  rate.  The  bounds  also  suggest  that  for  the  s-step  variant  to  have  any  use, 
we  must  have  Tk  =  o(e“^0)  Otherwise  there  can  be  no  expectation  of  orthogonality. 
Note  that  l||Hfclll2  should  be  <  IH^IIb  for  many  practical  basis  choices. 

Comparing  to  Paige’s  result,  we  can  think  of  sk  +  j  steps  of  classical  Lanczos  as 
the  case  where  s  =  1,  with  jVo  =  In,n  (and  then  Vsk+j  =  v'f.  j,  Bk  =  A).  In  this  case 
ffc  =  1  and  Tfc  =  cr  and  our  bounds  reduce  (modulo  constants)  to  those  of  Paige  [27]. 

4.1.  Proof  of  Theorem  4.2.  The  remainder  of  this  section  is  dedicated  to  the 
proof  of  Theorem  4.2.  We  hrst  proceed  toward  proving  (4.3). 

In  finite  precision,  the  Gram  matrix  construction  in  line  4  of  Algorithm  2  becomes 

Gk  =  fl{yIyk)=yIyk  +  6Gk,  where  \6Gk\<en\y^\\yk\,  (4.7) 

and  line  14  of  Algorithm  2,  becomes  $sk+j+i  =  fl{  fK'ii’'k,jGkw'k,jY^'^)-  Let 

d  =  fl{wkpGkWkj)  =  {w'kj  +  5wkj){Gk  +  6Gk,wj)w'kj 
=  w'k^jylykw'kj  +  fofcj'JGfcfofcy  +  w'k,jSGk,w,Wkj  +  Sw'i^jGkw'k^j, 

where 

\Sw'kJ  <  e{2s+2)\wkj\  and  (4.8) 

\5Gk,Wj  \  <  e(2s+2)|G'fc|.  (4-9) 

Remember  that  in  the  above  equation  we  have  ignored  all  terms.  Now,  we  let 
c  =  w'lfjSGkw'k^^  +  w'^/Gk,w,w'k^j  +  Sw'i^jGkw'kj,  where 

|c|  <  e(n+4s-|-4)rfc|| j)fcti)fcj II2. 


(4.10) 
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We  can  then  write 


d-  \\ykWk,j\\2  +  <^-  \\ykw'k^j\\^  +  c-  -  llA^fejIla  (l  + 


\ykw 


k,j  II 2 


\ykwi 


|2  ’ 


fc,i  II2 ' 


and  the  computation  of  Psk+j+i  becomes 

^sk+j+i  =  ='/d  +  5j3sk+j+i  =  1+  -  , — 17^  )+'^/3s/c+i+i,  (4.11) 

V  ^Wykw'kM) 

where 

\5l3sk+j+i\  <  ev^  =  ellAwfcjIb-  (4-12) 

The  coordinate  vector  is  computed  as 

Wfcj+i  =  /;(wfcjMfc+i+i)  =  (wfej  +  5wf,j)l(3sk+3+i,  (4.13) 

where 

\5w',,J<e\w'kj\.  (4.14) 

The  corresponding  Lanczos  vector  Vgk+j+i  (as  well  as  ttsfc+j+i)  are  recovered  by 
a  change  of  basis:  in  finite  precision,  we  have 

vsk+,+i  =  fi{yki'k,j+i)  =  {yk  +  syk,v,+,)v'k,j+i,  \syk,v,+A  <  e(2s+2)iAl,  (4.15) 

and 

Usk+j+i  =  fl{yku'k,j+i)  =  (A+<5A,«,+,)w'fc,,+i,  <  e{2s+2)\yk\-  (4.16) 

We  can  now  prove  (4.3)  in  Theorem  4.2.  Using  (4.11),  (4.13),  and  (4.15), 


'dsk+j+i'^sk+j+i  —  Vl.j_^_l{yk  +  syk,vj+i)  {yk  +  dyk,vj+i)i’k,j+i 

T  /  . 


Wk,j  + 

/3sk+j  +  l 


{ylyk  +  26yl,^^^yk) 


+  M.i 

Psk+j  +  l 


_  \\ykw'k,j\\l  +  +  2Sw'^jy^ykw'k^j 

^sfc+j+l 

_  \\ykw'k,j\\l  +  +  26w'^^^yl%w'k^j 

\\ykw't,  j\\l  +  (c  +  2||j)fcw;_j2  •  5(3sk+j+i) 

llAwfejIll  \\%w'^J\l{c  +  2\\ykw'^^J\2-5Psk+j+i) 


n'  l|4 


Awfcj  +  5w'i[jylyuw'f,j) 


=  1  - 


c+  2||j)fcw(,^^-||2  ■  d/3sfc+j+i 

wykwiM 

2{wkjyi.v,^,ykw'k.n + ^^'Lyiykw'k.,) 


wykw'k. 
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Now,  using  bounds  in  (4.7),  (4.8),  (4.9),  (4.10),  (4.15),  (4.16),  and  Lemma  4.1,  we 
obtain 


\v^k+j+i^sk+j+i  ~  1|  <e(n+4s+4)r^  +  2e  +  2e(2s+2)rfc  +  2eTk 
<e(n+4s+4)r^  +  e(4s+6)rj,  +  2e 
<e(n+8s+12)r|. 

This  thus  proves  (4.3),  and  we  now  proceed  toward  proving  (4.2).  Using  (4.9),  line  12 
in  Algorithm  2  is  computed  in  finite  precision  as 

dsfc+j  =  +  Sv'iFj){Gk  +  SGk,uj)u'kj, 

where  \Sv'i^  j\  <  e{2s  +  2)\v'^  -\  and  \5Gk, Uj\  <  e(2s  +  2)|G'fc|.  Expanding  the  above 
equation  using  (4.7),  and  (4.15),  we  obtain 

^sk+j  =v'k^jGku'kj  +  v'k^j5Gk,ujU'kj  +  Sv'kjGku'kj 

=^'k,j{ykyk  +  SGk)u'kJ  +  Vkj5Gk,UjUkj  +  SV/FjGku'kJ 

+  v'^jSGku'k,j  +  +  5v'lPku'k,j 

=  {Vsk+j  ~  ^yk,VjVkj)  {usk+j  ~  ^yk,UjUkj)  +  ^kJ^^kUkj  +  '^k,j^^k,UjUkj 

+  ^'^'k^j^ku'kj 

~'^sk+j'^sk+j  +  Sask+jj  (4-17) 

with  s&sk+j  =  Sv'^jGku'kj  +  +  6Gk,u,  -  yl^yk^u,  -  Sy^ySk^k^j- 

Using  bounds  in  (4.3),  (4.7),  (4.8),  (4.9),  (4.15),  and  (4.16),  as  well  as  Lemma  4.1, 
we  can  write  (again,  ignoring  terms), 

l^asfc+jl  <e{n+8s+8)\v'^j\\yk\\yk\\u'kJ 

<e{n+8s+8)  ||  |A||^;,.,|  lb  II  |3>fclK.,|  lb 

<e(n+8s+8)(rfc||i)sfc+j|b)(rfc||{tsfe+ilb) 

<e(n+8s+8)  (rfc(l  +  (e/2)(n+8s+12)r^))  (rfeHitsfc+j  |b) 

<e(n+8s+8)rfc(rfe||'u 

sk-\-j  II 2) 

<e(n+8s+8)r|  lltisfc+j  ll^.  (4.18) 

Taking  the  norm  of  (4.17),  and  using  the  bounds  in  (4.18)  and  (4.3),  we  obtain  the 
bound 


\ask+j\  <|bffc+j|bl|ws/c+ilb  +  basfc+il 

^(1  +  (€/2)(''i-+8s+12)r|)  llu^fc+j  lb  +  e(n+8s+8)r|||{(5fe_i_j||2 

<  (1  +  e((3/2)n+12s+14)r2)  ||u,fc+,  |b.  (4.19) 

In  finite  precision,  line  13  of  Algorithm  2  is  computed  as 
Wfc.i  =  '“fcj  “  ^sk+jVkj  -  Swkj,  where  \SwkJ  <  e{\u'kj  +  2\ask+jVkJ).  (4.20) 
Multiplying  both  sides  of  (4.20)  by  gives 

ykw'kj  =  yku'kj  -  ask+^ykv'kj  -  ykSwkj, 
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and  multiplying  each  side  by  its  own  transpose,  we  get 

w'kjylykw^j  =  {ykUkj-ask+jykv'k,j-ykSwk^j)'^{yku'k,j-ask+jykVk,j-ykSwkj) 
=  '^'kjylykUkj  —  2ask+jUkjykykVk,j  +  ^^sk+j’^'kjy'kykVkJ 
-  (yku'k^j-ask+jykv'kj )  -  {yku'kj  -  ask+j%Kjf%sw'kj . 

Using  (4.15)  and  (4.16),  we  can  write 

'^kjyk  ykWkJ  —  (Usk+j  ~  Syk,UjUkj)  {Usk+j  ~  Syk,UjUkj) 

2ttsk+j{Usk+j  ^yk.UjUkj)  {^sk+j  ^yk.vj’^kJ^ 

4"  '^sk+ji'^sk+j  ~  5yk,Vj'Vkj)  (Vsk+j  ~  ^yk,VjVkj) 

—  2SWkjyk  (ykUkJ  ~  ^sk+jS^k^kj) 

~  '^sk+j'^sk+j  ~  ‘^^sk+j^yk,UjUk,j  ~  ‘^^sk+j'^sk+j'^sk+j 

+  ‘2'^sk+jUsk+j^yk,Vj'Vkj  +  26lsk+jUkjSy k,Uj^sk+j 

4~  ^sk+j^sk+j'^sk+j  2(Xgk+j^sk+j^yk,Vj^kJ 

—  2Swkjyk  (yk^kj  ~  ^sk+jyk'Vkj) 

~  '^sk+j'^sk+j  ~  26lsk+jUgk+j^8k+j  +  CX.gk+j'^sk+j'^sk+j 

2{Syk^Uj’^k,j  ^8k+j^yk^Vj^k,j)  i'^sk+j  ^sk+j'^sk+j) 

—  251111,  jS^f.  {ykUkj  ~  Ctsk+jS^k^kj)- 


This  can  be  written 

Il^'fc'^fc,jll2  “  ll'^sfc+j||2  ~  2o.sk+jUgk+j^sk+j  +  ®sfc+j  ll^lsfc+j  II2 

2{Syk^Uj'5kj  5xsk+j5yk,Vj^k,j  5~  yk5Wj,  j)  (^iisk+j  5^sk+j‘5sk+j) ^ 

where  we  have  used  yku'^.  ^  -  ctsk+jykv'k  j  =  Usk+j  -  ask+jVsk+j  +  0{e).  Now,  us¬ 
ing  (4.17), 

l|34fc'd'fcj-||2  =  l|ilsfc+j||2  ~  2digk+j{cy.sk+j  —  5ask+j)  +  ^sk+j\\'^sk+j\\2 

2{Syk^Uj'5kj  5Xsk+j5yk,Vj^k^j  4”  yk^^k^j)  {'^sk+j  5^sk+j'5sk+j) 

=  ||wsfc+j||2  4“  <3sfe+j(||'Csfe-i-j||2  ~  2)  -l-  2ask+j56'sk+j 

2{Syk^Uj'5kj  5Xsk+j5yk,Vj^k^j  4”  yk5Wj,  j)  {^iisk+j  5^sk+j'5sk+j) • 

Now,  we  rearrange  the  above  equation  to  obtain 

Il^^fe^fe,jll2  4“  ci!gj,_^j  —  ||'Usfc+j||2  =  clgfc-i-j (ll^sfc+i II2  ~  1)  4“  2agk+j56:gk+j 

2{Syk^UjUk  j  Oigk+j5yk^Vj'5k^j  4“  yk5Wf,  j)  {Ugk+j  Oigk+jVgk+j^' 


Using  Lemma  4.1  and  bounds  in  (4.3),  (4.15),  (4.16),  (4.18),  (4.19),  and  (4.20) 
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we  can  then  write 

\\ykw'kj\\2+^^sk+j~\\'^sk+j\\2  <  (^l+e((3/2)n+12s+14)r^^  •  e(n+8s+12)r^||Msfc+j||2 

+  2 ^l+e((3/2)n+12s+14)r|y  e(n+8s+8)r|||M5fc_i_j||2 
+  2e((2s+2)  +  (2s+2)  +  3)  Tfc  II  II  2  *  2||'^sfc+j||2 

<  e(nH-85+12)r^||'Usfc_i_j  III  +  2e(n+8s+8)r|||'Usfc+j||| 

+  e(16s+28)rfc||{(sfe_|_j|||, 

where,  again,  we  have  ignored  terms.  Using  Fj,  <  r|,  this  gives  the  bound 

llAic'fcJII  +  -  \\usk+,\\l  <  e(3n+40s+56)r|||h,fe+,|||.  (4.21) 

Given  the  above,  we  can  also  write  the  bound 
\\ykwl,\\l  <  \\ykw,,^j\\l  +  alk+j  <  (l  +  e(3n+40s+56)r|)||M,fc+,|||,  (4.22) 

and  using  (4.10),  (4.11),  and  (4.12), 

\Psk+j+i\  <  II -112  (  1  +  e  +  ) 

^  (l  +  (l/2)e(3n+40s+56)r|) ||usfe+j II2  [  1  +  e+  ..  —  ) 

V  Aykw'^^jWi) 

<  (1  +  e  +  (l/2)e(n+4s+4)r|  +  (l/2)e(3n+40s+56)r|) ||Msfc+j||2- 
Combining  terms  and  using  1  <  r|,  the  above  can  be  written 

|/3sfc+j+i|  < (1  +  e(2n+22s+31)r|)  ll^.  (4.23) 

Now,  rearranging  (4.13),  we  can  write 

Psk+j+iv'k,j+l  =  ''^'k,j  + 
and  premultiplying  by  yk,  we  obtain 

Psk+j+iykv'k,j+i  =  ykw'kj  +  yk5w't,y 
Using  (4.15),  this  can  be  written 

Psk+j+i{vsk+j+i  ~  =  ykWk,j  d"  yk^w^j- 

Rearranging  and  using  (4.13), 

Psk+i+lVsk+j  +  l  =  ykWkj  +  ykSWkJ  +  +  + 

=  ykw'kj  +  +  (5A.«,+i(wfcj  + 

=  ykw'kj + yk^w'k,]  +  5yk,v,+^w'k,j 

=  ykw'kj  +  Swsk+j, 


(4.24) 
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where  Swsk+j  =  ykSw').  j  +  dyk,vj+iw'i^  j-  Using  Lemma  4.1  and  bounds  in  (4.14), 
(4.15),  and  (4.22), 

\\Swsk+jh  <  elllAllwfejIlb  +  e(2s+2)|||A||wfej|||2 

<  e(2s+3)rfc||j^feu;;,_^.||2 

<  e(2s+3)rfc||'Usfc+j||2.  (4-25) 

We  premultiply  (4.24)  by  and  use  (4.15),  (4.16),  (4.17),  and  (4.20)  to  obtain 

^sk+j+lVgk+jVsk+j  +  l  =  'Vsk+jiykWi^j  +  6Wsk+j) 

=  vJk+j{yku'k,j  -  ask+jykVk,j  -  ykSwkj)  +  vJk+jSw,k+j 

=  vJk+j{yku'k,j  -  ask+jykv'kj)  -  vJk+^{ykSwkj  -  6wsk+j) 

~  '^sk+jii'^sk+j  ~  Syk,UjUi^  j)  —  ask+jivsk+j  ~  j)) 

-  vJk+jiykSw'^j  -  Sw,k+j) 

~  '^sk+j'^sk+j  ~  ^sk+ji'sk+j'^sk+j 

-  vIk+j{syk,u,Uk  j  -  ask+jSyk,v,Vkj  +  ykSw'k^j  -  swsk+j) 

—  {^sk-\-j  II 2 

~  ^sk+ji^yk,UjUf.  j  —  d'sk-\-jSyk,Vjyk,j  ykSWkj  ~  SWsk+j) 

=  SO-sf^^j  (Il'U^/^-l-j  II2  1) 

-  vJk+ji^yk,u,u'^j  -  ask+jSyk,vy'^j  +  ykSw'^^j  -  swsk+j), 

and  using  Lemma  4.1  and  bounds  in  (4.3),  (4.13),  (4.15),  (4.16),  (4.18),  (4.19),  (4.20), 
and  (4.25),  we  can  write  the  bound 

^sk+j  +  l  ■  'Vsk+j^sk+j+1  ^  I'Jdsfc+jl  +  |dsfc+j  I  I'Csfc+j'Csfe+j  ~  1| 

+  llusfc+jibdl  \^yKn,\\u'k,j\  II2  +  \ask+j\\\  \Syk,v,\\v'kJ  II2) 

+  llUsfc+jIbdl  \yk\\SWkJ  II2  +  ||(5Wsfc+dl2) 

<  2e(n+lls+15)r2||u,fc+j-||2.  (4.26) 

This  is  a  start  toward  proving  (4.2).  We  will  return  to  the  above  bound  once  we 
later  prove  a  bound  on  ||wsfc+j||2.  Our  next  step  is  to  analyze  the  error  in  each  column 
of  the  finite  precision  s-step  Lanczos  recurrence.  First,  we  note  that  we  can  write  the 
error  in  computing  the  s-step  bases  (line  3  in  Algorithm  2)  by 

Ay^  =  ykBk  +  6Ek  (4.27) 

where  =  [Vk[Is,0s,iV ,0n,i,i^k[Is,0s,iV ,0n,i]-  It  can  be  shown  (see,  e.g.,  [3])  that 
if  the  basis  is  computed  in  the  usual  way  by  repeated  SpMVs, 

\6Ek\  <  e{{3+n)\A\\y^\+4\yk\\Bk\).  (4.28) 

In  finite  precision,  line  17  in  Algorithm  2  is  computed  as 


^'k,j  —  Bkv'kj—Psk+jv'kj-i+Su'i^j,  \Su'i^j\  <e({2s+3)\Bk\\vkj\+‘2\Psk+jVk,j-i\),  (4.29) 
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and  then,  with  Lemma  4.1,  (4.15),  (4.16),  (4.27),  and  (4.29),  we  can  write 
^sk-\-j  —  (iVfc  4” 

=  (iVfc  +  5yk,Uj){BkVk,j  ~  ^sk^jVk,j-l  +  ^'^k,j) 

S^k^k^k,j  ^sk-\-jyk'^k,j—l  4”  ^k^^k,j  4“  ^yk,Uj^k^k,j  ^sk-\-j^yk,Uj^k,j  —  l 

=  i^Sk  -  ^^k)vkj  -  Psk-\-j{vsk+j-i  -  syk,vj.iVkj-i)  +  ykSukj 

4“  ^yk,Uj^k'^kJ  ^sk-\-j^yk,Uj'^kJ—l 

^^k^k,j  ^^k^k,j  ^sk-\-j'^sk-\-j  —  l  4“  ^sk-\-j^yk,Vj  —  i'^k,j—l  4“  ^k^^k,j 

4“  ^yk,Uj^k'^kJ  ^sk-\-j^yk,Uj'^kJ—l 

=  A(^Vsk-\-j  ^S^k,Vj'^kj')  ^^k'^kj  Psk-\-j'^sk-\-j—l  4“  Psk-\-j^yk,Vj-i'^kJ  —  l 

4“  yk^^k,j  4“  ^yk,Uj^k'^k,j  ^sk-\-j^yk,Uj'^k,j—l 
—  AVgk-\-j  ^^^k,Vj'^kJ  ^^k^k,j  Psk-\-j'^sk+j—l  4“  Psk-\-j^yk,Vj-i'^kJ  —  l 

4“  yk^^k,j  4“  ^yk,Uj^k'^k,j  ^sk-\-j^yk,Uj'^k,j—l 
=  AVsk-\-j  ^sk-\-j'^sk-\-j—l  4“  SUsk-\-j')  (4.30) 

where 

^'^sk+j  —  yk^'^kj  {ASyk,Vj  ^yk,Uj^k  4“  SEk')Vk,j  4“  ^sk-\-j{^yk,Vj-i  ^^k,Uj)^kJ—l' 
Using  the  bounds  in  (4.15),  (4.16),  (4.23),  (4.28),  and  (4.29)  we  can  write 

\6usk+j\  <  e{{2s+3)\yk\\Bk\\vkJ  +  2\Psk+j\\yk\\vkj-i\) 

+  €{2s+2)\A\\yk\\vkj\  +  e(2s+2)|A||^fe||t^fcj| 

+  e{{5+n)\A\\yk\\vkj\  +  4|A||^/c||^fcj|) 

+  2e{2s+2)\Psk+j\\yk\\vk,j-i\ 

<  e{n+2s+5)\A\\yk\\v[J  +  e(4s+9)|J^fc||^fc||^fej| 

+  e(4s+6)(l  +  e(2n+22s+31)r^) ||Msfc+j_i II2  •  |A||v^j_,|. 

and  from  this  we  obtain 

\\Su.,k+jh  <  e(n+2s+5)  ||  |A|  lb  ||  |A|  lb  \\v'kj2  +  <45+9)  ||  |A|  |b  II  l^<  II2  Wi'kjh 
+  e(4s+6)  lllAlIb  \\vk,j-ih  l|Msfc+i-i|b 
<  e(n+2s+5)  ||  |A|  |b  +  e(4s+9)  |j  \Bk\  |b  Ffe  +  e(4s+6)  Httsfe+j-ilb  Ffc. 

We  will  now  introduce  and  make  use  of  the  quantities  cr  =  |||2l||b/||2l|b  and  Tk  = 
II  l^fcl  Ib/ll^lb-  Note  that  the  quantity  ||  |Sfc|  |b  is  controlled  by  the  user,  and  for  many 
popular  basis  choices,  such  as  monomial,  Newton,  or  Chebyshev  bases,  it  should  be 
the  case  that  111^8^1  |b  <  ||  |A|  |b-  Using  these  quantities,  the  bound  above  can  be 
written 

II2  <  e(^((n+2s+5)cr  +  (4s+9)Tfc)P|b  +  (4s+6)||usfe+j_i|b)Ffc.  (4.31) 
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Manipulating  (4.24),  and  using  (4.15),  (4.16),  and  (4.20),  we  have 

i^sk+j  +  l'^sk+j  +  l  —  y^k^k,j  4”  ^'^sk+j 

S^k^k^j  ^sk+jS^k^k^j  ^k^kUk  j  ~\~  ^^^sk+j 

—  iS^sk+j  ^S^k,Uj  ^kj')  ^sk+j  {"^sk+j  ^^k^vj'^k.j)  ~^^'^sk+j 

—  'Q'sk+j  ^sk+j'^sk+j  ^S^k.Uj  'lkf^j~\~(Xsk+j^^k^Vj  ^k,j  ^k^^kj 
+  SWsk+j, 

and  substituting  in  the  expression  for  iisk+j  in  (4.30)  on  the  right,  we  obtain 

Psk+j  +  l'^sk+j  +  l  =  -^'^sk+j  ^sk+j'^sk+j  i^sk+j'^sk+j  —  l  4”  ^'^sk+j:  (4.32) 

where 

^'^sk+j  —  ^'^sk+j  ^^k^UjUf^  j  4“  ^sk+j^^k^Vj'^k,j  ^k^'^kj  4“  6Wsk+j- 

From  this  we  can  write  the  componentwise  bound 

\Svsk+j\  <  \Susk+j\  +  \syk,u,\  \u'kj  +  \ask+j\  \syk,v,\  \v'kj  +  lAl  Idwfcjl  +  \Swsk+j\, 
and  using  Lemma  4.1,  (4.14),  (4.15),  (4.16),  (4.19),  (4.20),  and  (4.25)  we  obtain 

ll^nsfe+jlb  ^  ||<5w  sk-\-j  II 2  +  e(2s+2)||  |j4fe||ufcj|  II 2 

4-  (l  4-  e((3/2)n+12s+14)rfc)  ||usfc+j||2  •  e(2s+2)||  |34fc||{)(,^^-|  ||2 

4-  e||  |34fc||ufej|  II2  4-  2e(l  +  e((3/2)n+12s+14)r^) ||Msfc+j II2II  |34fc||u(,j|  II2 

+  e(2s+3)rfe||Msfc+j||2 

<  ||<5wsfc+ill2  4-  e(2s+2)rfe||Msfc+j||2  4-  e(2s+2)rfc||'Ugfc_|_j II2 
4-  eFfcll  II 2  4-  2erfe||  '^sk-\-j  II2  +  e(2s+3)rfc||  ^sk-\-j  II 2 

<  ||dusfc+il|2  4-  e(6s+10)rfc||usfc+j||2. 

Using  (4.31),  this  gives  the  bound 
WSvsk+jh  < 

e^((n+2s+5)(T+(4s+9)Tfc)  ||2l||2  +  (6s+10)||Msfc+j||2  +  (4s+6)||'Usfe+j-i||2^rfc.  (4.33) 

We  now  have  everything  we  need  to  write  the  finite-precision  s-step  Lanczos 
recurrence  in  its  familiar  matrix  form.  Let 

do  $1 

Pi 

Psk+j 

Psk+j  dsfc+J  _ 

and  let  Vsk+j  =  N,  ni,  •  •  ■ ,  n^fc-i-j]  and  6Vsk+j  =  [duo,  dui, . . . ,  dugfe-t-j]-  Note  that 
Tsk+j  has  dimension  (sfc-|-j-|-l)-by-(sA:-|-j-|-l),  and  Vsk+j  and  SVsk+j  have  dimension 
n-by-(sA:  +  j  +  1).  Then  (4.32)  in  matrix  form  gives 

^Vsk+j  —  Vsk+jTgk+j  4“  Psk+j  +  l'^sk+j  +  l^sk+j+1  ^^sk+j- 


(4.34) 
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Thus  (4.33)  gives  a  bound  on  the  error  in  the  columns  of  the  finite  precision  s-step 
Lanczos  recurrence.  Again,  we  will  return  to  (4.33)  to  prove  (4.1)  once  we  bound 

\\Usk+j\\2- 

Now,  we  examine  the  possible  loss  of  orthogonality  in  the  vectors  Do, ... ,  Vsk+j+i- 
We  define  the  strictly  upper  triangular  matrix  Rsk+j  of  dimension  (sfc  +  j  +  l)-by- 
(sfc  +  j  +  1)  with  elements  pij,  for  i,j  G  {1, . . . ,  sk  +  j  +  1} ,  such  that 

^sk+j^sk+j  =  Rgk+j  T  diag(t4fc+jb)ifc+j)  +  Rsk+j- 

For  notational  purposes,  we  also  define  Psk+j+i,sk+j+2  =  i’Jk+j'^sk+j+i-  (Note  that 
Psk+j+i,sk+j+2  is  not  an  element  in  Rgk+j,  but  would  be  an  element  in  Rgk+j+i). 
Multiplying  (4.34)  on  the  left  by  V^k+j^ 


^sk+j^^sk+j  —  ^sk+j'^sk+jTsk+j  +  Psk+j  +  lVsk+j'^sk+j+ie-sk+j  +  l  '^sk+j^^sk+j- 


Since  the  above  is  symmetric,  we  can  equate  the  right  hand  side  by  its  own  transpose 
to  obtain 


Tsk+j{Rsk+j  +  Rsk+j)  ~  {Rgk+j  +  Rsk+j)Tgk+j  = 

Psk+j  +  l{Vgk+j'^sk+j+lSsk+j+l  ~  ^sk+j  +  lVgk+j  +  l^sk+j) 

+  ^sk+j ^^sk+j  —  SVgk+j  Vsk+j  +  diag(14fc+j  Vsk+j )  •  Tsk+j  —  Tgk+j  •  diag(t4 fc+j ^sk+j )  ■ 

Now,  let  Mgk+j  =  Tgk+jRgk+j  —  Rsk+jTgk+j,  which  is  upper  triangular  and  has 
dimension  {sk  +  j  +  l)-by-(sA:  +  j  +  1).  Then  the  left-hand  side  above  can  be  written 
as  Mgk+j  —  and  we  can  equate  the  strictly  upper  triangular  part  of  Mgk+j 

with  the  strictly  upper  triangular  part  of  the  right-hand  side  above.  The  diagonal 
elements  can  be  obtained  from  the  definition  Mgk+j  =  Tgk+jRgk+j  ~  Rsk+jTgk+j,  i-O., 

^1,1  —  /^lPl,2,  '^sk+j  +  l^sk+j  +  1  —  [^sk+j  Psk+j^sk+j  +  1,  and 

m-i.*  =A-1P*-1.*  -  for  i  G  {2,. . .  ,sk  +  j}. 

Therefore,  we  can  write 

^sk+j  Tgk+jRgk+j  Rgk+jTgk+j  i^sk+j  +  l^gk+j'^sk+j  +  l^sk+j+l  ^sk+j, 

where  Hgk+j  has  elements  satisfying 
i?i.i  =  -  PlPl,2, 

r]i+ =Pi-iPi-i+ -  PiP^++i,  for  i  G  {2,...,sk+j+l}, 

^  (4.35) 

r]i-i+  =vf_2Sv^-l  -  5vf_2Vi-i  +  i3r-i{vf_2Vt-2  -  Dj^iDi-i),  and 

=vj_i6vt-i  -  6vj_iv,,-i,  for  £  e  {1, . . . ,  i  -  2}. 

To  simplify  notation,  we  introduce  the  quantities 

Usk+j  =  max  ||ui||2,  ffc  =  max  Tj,  and  fk  =  max  n. 
2G{0,...,sfc+j} 
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Using  this  notation  and  (4.3),  (4.23),  (4.26),  and  (4.33),  the  quantities  in  (4.35)  can 
be  bounded  by 

|»7i,i|  <  2e{n+l\s+\b)f\usk+j:  and,  for  i  e  {2, . . . ,  sfc+j+1}, 

WA  <  4e(n+lls+15)f^Usfc+j, 

At-iA  <  2e(^((n+2s+5)(j  +  (4s+9)ffc)||A||2  +  (n+18s+28)'Usfc+j^f^,  and 
AtA  —  2e(^((n+2s+5)(T +(4s+9)rfe)  ||  A||2  +  (10s+16)'Usfc+j^r^, 
for  ^  e  {1, . . . ,  i—2}. 

The  above  is  a  start  toward  proving  (4.6).  We  return  to  this  bound  later,  and 
now  shift  our  focus  towards  proving  a  bound  on  ||usfe+j  ||2-  To  proceed,  we  must  first 
find  a  bound  for  \v'[j^_^_jVsk+j-2\-  From  the  definition  of  Mgk+j,  we  know  the  (1,2) 
element  of  Mgk+j  is 

^0Pl,2  —  ^lPl,2  —  ^2Pl,3  =  Pi, 2, 

and  for  f  >  2,  the  (z— l,i)  element  is 

Pi-2Pi-2,i  +  {&i-2  —  ^i-l)Pi-l,i  —  APi-l.i+l  =  Vi-1, i- 

Then,  defining 

A  =  {^i-2  —  ai-i)$i-iPi-i,i  —  $i-irii-l,i 

for  z  e  {2, . . . ,  sfc+j},  we  have 

$i-lPiPi-l,i+l  =  l3i-2$i-lPi-2,i  +  +  A-1  +  •  •  •  +  ^2  ■ 

This,  along  with  (4.19),  (4.23),  (4.26),  and  (4.36)  gives 

Psk+j—lPsk+j\Psk+j—l,sk+j+l  \  —  Psk+j  —  lPsk+j  \Vgk+j  —  2'^sk+j  \ 
sk-\-j  sk-\-j 

<  X!  1^*1  -  X]  (I“*-2|  +  |a*-i|)|/3i-ip*-i.i|  + 

i—2  i—2 

sk+j 

<2e  ^  (  ^((rz+2s+5)(j  + (4s+9)Tfc)  11 AII2  +  (3n+40s+58)zZsfc-|_j^ f ^zZsfe-i-j 

i=2 

^2e(s/c+^' — 1)  ^ ^(7z+2s+5)fj  +  (4s+9)T/j;^  11^112  T  (3TZ+40s+58)zZsfc_i_j^  Y^^Ugk+j • 

(4.37) 


Rearranging  (4.30)  gives 

^sk+j  ^^sk+j  ^"^sk+j  i^sk+j^sk+j-l^ 
and  multiplying  each  side  by  its  own  transpose  (ignoring  A  terms),  we  obtain 

'^sk+j'^sk+j  ~  2Ug^^j5Usk+j  =  II^I^sfc+j||2+/3sfc+jll'I^sfc+J-l||2~2/3sfc_|_jZ)g^_l_j2lVsfe+j_l. 

(4.38) 

Rearranging  (4.32)  gives 


TlUs/c+J  — 1  —  Ask+j^sk+j  4”  O^sk+j  —  Ask+j—l  T  /^sk+j  —  l'^sk+j  —  2  — 1 
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and  premultiplying  this  expression  by  j3sk+jV^k+j^ 

Psk+jVgj-^jAVgk+j-l 

~Psk+j’^sk+j{,Psk+j‘^sk+j  “t“  ttgk-\-j—lVgk+j—l  Psk+j  —  l'^sk+j  —  2  ^"^sk+j  —  l) 

“/^sfc+j  ll'^sfc+j  II2  ^sk+j-liPsk+jVgk+ji’sk+j-l)  +  Psk+j$sk+j-l'Vgk+j'^gk+j-2 

~  Psk+jVgk+j^^sk+j-l 

+  5psk+„  (4.39) 

where,  using  bounds  in  (4.3),  (4.19),  (4.23),  (4.26),  (4.33),  and  (4.37), 

\5^sk+j\  ^  e^((n+2s+5)CT  +  (4s+9)Tfe) || A||2  +  (3n+40s+58)Msfc+j^ (4.40) 
4-2e(sA;4-j  —  I)^^(7r4-2s4-5)(j  4- (4s  4- 9)  ||^||  2  4“(3Tr4“40s4“58)'Usfc+j^  Y‘^Ugk+j  • 

Adding  2uJi.^jSugk+j  to  both  sides  of  (4.38)  and  using  (4.39),  we  obtain 

I|ilsfc+ill2  =  ll^i^sfc+j||2  +  ^sk+j{\\'^sk+j-l\\2  ~  2)  —  25Pgk+j  +  2Ug^_^j6Ugk+j 

=  ll^I'sfc+jll^  +  Plk+j{\\^sk+j-l\\2  ~  2)  +5^sk+i,  (4.41) 

where  df3gk+j  =  —2Sf3gk+j  +  2,u^i^_^_j6usk+j,  and,  using  the  bounds  in  (4.31)  and  (4.40), 

\SPgk+j\<2\6Pgk+j\+2\\uJ^.^j\\2\\Sugk+j\\2 

<  4e(sfc4-j)  ^((n4-2s4-5)(T4-(4s4-9)Tfe)  ||  A||2  4-  (3n4-40s4-58)'Usfc_|_j^  V'^Ugk+j 

4- 2e^((n4-2s4-5)CT4-(4s4-9)rfc)  II AII2  4-  (4s4-6)'Usfe+j^rfc'Usfc+j 

<  4e(sfc4-j4-l)^((n4“2s4-5)cr  4-(4s4-9)Tfc)  ||  A||2  +  (3n4-40s4-58)usfc+j^f  fctisfc+j • 

(4.42) 

Now,  using  (4.41),  and  since  >  0,  we  can  write 

||'Usfc+j||2  ^  l|ilsfc+i  Il2+/3sfc+j  ^  Il^ll2ll'^sfc+j||2+/3sfe+j(ll'l’sfe+i-l|l2~l)+l'^/3s/c+i|.  (4.43) 

Let  ^  =  max  {ugk+j,  ||  AII2}.  Then  (4.43)  along  with  bounds  in  (4.3),  (4.23),  and  (4.42) 
gives 

II Usfe-i-j  II 2  ^11  A|| 2  4-4e(sA:4-j  +2)((n4-2s4-5)o' 4-(4s4-9)Tfc 4“(37j4-40s4“58))r^/i^.  (4.44) 

We  consider  the  two  possible  cases  for  /i.  First,  if  ^  =  ||A||2,  then 

llwsfe+jlli  —  11^112  4“  4e(sfc4-j4-2) ((n4-2s4-5)cr  4-  (4s4-9)Tfc  4-  (3n4“40s4“58))f ^|| AII2 

—  11^112  ^4  4”  4e(sfc4-j4-2)  ((n4-2s4-5)(T  4-  (4s4-9)Tfc  4“  (3n.+40s4-58))f^^  . 

Otherwise,  we  have  the  case  fi  =  Ugk+j-  Since  the  bound  in  (4.44)  holds  for  all 
||{tsfe+j||2,  it  also  holds  for  =  /r^,  and  thus,  ignoring  terms  of  order  e^, 

<  11^112  4“  4e(sA:4-j4-2) ((n4-2s4-5)(T  4-  (4s4-9)Tfc  4-  (3n4-40s4-58))f^/r^ 

—  11^112  4“  4e(sA:4-j4-2) ((n4-2s4-5)(7  4-  (4s4-9)Tfc  4-  (3n4-40s4-58))f^||A||2 
—  II AII2  ^1  4“  4e(sfc4-j4-2)  ((7r4-2s4-5)o'  4- (4s4-9)Tfc  4- (3Tj4-40s4-58))f  , 
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and,  plugging  this  in  to  (4.44),  we  get 

II II 2  ^  II 4l|| 2  ^l+4e(s/c+j+2)  ((n+2s+5)(T+(4s+9)Tfc  +  (3n+40s+58))f’^^ .  (4.45) 

In  either  case  we  obtain  the  same  bound  on  ||usfe+j|||,  so  (4.45)  holds. 

Taking  the  square  root  of  (4.45),  we  have 

11115^-1-^112  ^  ll^ll 2  +  2e(sfc+7-|-2)  ((7r+2s+5)(T  +  (4s+9)Tfe  +  (3n+40s+58)) f ,  (4.46) 

and  substituting  (4.46)  into  (4.26),  (4.33),  and  (4.36)  proves  the  bounds  (4.2),  (4.1), 
and  (4.6)  in  Theorem  4.2,  respectively,  assuming  that 

2e(s/c+j-t-2) ^(7r+2s+5)fj  +  (4s+9)T/g  +  (37r-t-40sT58)^r^  1. 

The  only  remaining  inequality  to  prove  is  (4.4).  To  do  this,  we  first  multiply  both 
sides  of  (4.24)  by  their  own  transposes  to  obtain 

^Ik+j+iWvsk+j+iWl  =  l|3f/cWfcj||i  + 

Adding  -  Hftsfe-i-j ||i  to  both  sides, 

/^sfc+j  +  lll'®sfc+J  +  l||2  ^sk+j  ~  Il'^sfc+ill2  =  ^sk+j  ~  Il^sfc+ill2 

+  2SwJ,,_^_jykW,,J. 

Substituting  in  (4.41)  on  the  left  hand  side, 

/^sfc+J  +  l  Il'^sfc+J  +  l  II2  +  ^sk+j  ~  Il^'^sfc+ill2  ~  /3sfc+j(ll'^sfc+j-l||2  ~  2)  —  S^sk+j  = 

l|3ffcWfcj||2  +  ^Ik+j  ~  l|Wsfc+jll2  +  ‘^^’’^Ik+jS^kW^J, 

and  then  subtracting  from  both  sides  gives 

/^sfc+J  +  l  (ll^sfc+i  +  l  II  2  ~  1)  4”  ^sk+j  ~  II2  ~  (l|41sfc+j  — 1 II2  ~  2)  —  S^sk+j  = 

\\ykWkj\i  +  dffc+j  -  \\usk+j\\i  +  ‘^Swjk+jykw'kj  -  Psk+j+i- 

This  can  be  rearranged  to  give 

Psk+j+l  +  '^sk+j  +  Psk+j  ~  l|4l'Csfe+ill2  =  ll^’fc'Wlfcj'll2  +  ^sk+j  ~  Il'^sfe+jll2 

+  ‘^^^^k+jykw'i.j  +  ^5j,-,_j(||Dsfc-|-j_i||2  —  1) 

~  /45fe-|-j-|-l(ll'I'sfe+j  +  l|l2  ~  1)  4"  S^sk+j, 
and  finally,  using  (4.3),  (4.21),  (4.23),  (4.25),  and  (4.42)  gives  the  bound 


/3: 


a 


sk-\-j 


■  P'^sk+j 


II  ADgfc-i-j  II 2 


< 


4e(sfc+j-|-2)  ((n+2s+5)(7  +  (4s+9)Tfc  +  (3n+40s+58))r^||A||2. 


This  proves  (4.4)  and  thus  completes  the  proof  of  Theorem  4.2. 
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Fig.  5.1.  Normality  (top  left),  orthogonality  (top  right),  recurrence  column  error  (bottom  left), 
and  difference  in  recurrence  column  size  (bottom  right)  for  classical  Lanczos  on  2D  Poisson  with 
n  =  256.  Upper  bounds  are  taken  from  [27]. 

5.  Numerical  Examples.  We  give  a  brief  example  to  illustrate  the  bounds 
in  (4.1),  (4.2),  (4.3),  and  (4.4).  We  run  s-step  Lanczos  (Algorithm  2)  in  double  pre¬ 
cision  with  s  =  8  on  the  same  model  problem  used  in  Section  3:  a  2D  Poisson  matrix 
with  n  —  256,  ||A||2  =  7.93,  using  a  random  starting  vector.  For  comparison,  Fig¬ 
ure  5.1  shows  the  results  for  classical  Lanczos  using  the  bounds  derived  by  Paige  [27]. 
In  the  top  left,  the  blue  curve  gives  the  measured  value  of  normality,  —  and 

the  black  curve  plots  the  upper  bound,  (n-|-4)e.  In  the  top  right,  the  blue  curve  gives 
the  measured  value  of  orthogonality,  \pi+ivf  Vi+i\,  and  the  black  curve  plots  the  upper 
bound,  2(n-|-4)||A|j2e.  In  the  bottom  left,  the  blue  curve  gives  the  measured  value  of 
the  bound  (4.1)  for  ||5'()i||2,  and  the  black  curve  plots  the  upper  bound,  e(7-|-5||  |A|  II2). 
In  the  bottom  right,  the  blue  curve  gives  the  measured  value  of  the  bound  (4.4),  and 
the  black  curve  plots  the  upper  bound,  4ie(3(n  +  4)||A||2  -I-  (7  -I-  5||  |A|  ||2))||A|j2. 

The  results  for  s-step  Lanczos  are  shown  in  Figures  5.2— 5.4.  The  same  tests  were 
run  for  three  different  basis  choices:  monomial  (Figure  5.2),  Newton  (Figure  5.3),  and 
Chebyshev  (Figure  5.4)  (see,  e.g.,  [31]).  For  each  of  the  four  plots  in  each  Figure, 
the  blue  curves  give  the  measured  values  of  the  quantities  on  the  left  hand  sides  of 
(clockwise  from  the  upper  left)  (4.3),  (4.2),  (4.1),  and  (4.4).  The  cyan  curves  give 
the  maximum  of  the  measured  values  so  far.  The  red  curves  give  the  value  of  F^  as 
defined  in  Theorem  4.2,  and  the  blacks  curves  give  the  upper  bounds  on  the  right 
hand  sides  of  (4.3),  (4.2),  (4.1),  and  (4.4). 

We  see  from  Figures  5.2— 5.4  that  the  upper  bounds  given  in  Theorem  4.2  are 
valid.  In  particular,  we  can  also  see  that  the  shape  of  the  curve  gives  a  good  indica¬ 
tion  of  the  shape  of  the  curves  for  maxi<sfc+j  — 1]  and  md.iii<sk+j  \Pi+ivJ Vi+i\. 

However,  from  Figure  5.2  for  the  monomial  basis,  we  see  that  if  the  basis  has  a  high 
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Fig.  5.2.  Normality  (top  left),  orthogonality  (top  right),  recurrence  column  error  (bottom  left), 
and  difference  in  recurrence  column  size  (bottom  right)  for  s-step  Lanczos  on  2D  Poisson  with 
n  =  256  and  s  =  8  for  monomial  basis.  Bounds  obtained  using  Tf^  as  defined  in  Theorem  J^.2. 
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Fig.  5.3.  Normality  (top  left),  orthogonality  (top  right),  recurrence  column  error  (bottom  left), 
and  difference  in  recurrence  column  size  (bottom  right)  for  s-step  Lanczos  on  2D  Poisson  with 
n  =  256  and  s  =  8  for  Newton  basis.  Bounds  obtained  using  Tj.  as  defined  in  Theorem  4-2. 
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Fig.  5.4.  Normality  (top  left),  orthogonality  (top  right),  recurrence  column  error  (bottom  left), 
and  difference  in  recurrence  column  size  (bottom  right)  for  s-step  Lanczos  on  2D  Poisson  with 
n  =  256  and  s  =  8  for  Chebyshev  basis.  Bounds  obtained  using  Fk  as  defined  in  Theorem  4.2. 


condition  number,  as  does  the  monomial  basis  here,  the  upper  bound  can  be  a  very 
large  overestimate  quantitatively,  leading  to  bounds  that  are  not  useful. 

There  is  an  easy  way  to  improve  the  bounds  by  using  a  different  definition  of 
to  upper  bound  quantities  in  the  proof  of  Theorem  4.2.  Note  that  all  quantities  which 
we  have  bounded  by  ffc  in  Section  4  are  of  the  form  ||  |j^fc||a;|  ||2/||!Vfea;||2.  While  the  use 
of  Tfe  as  defined  in  Theorem  4.2  shows  how  the  bounds  depend  on  the  conditioning 
of  the  computed  Krylov  bases,  we  can  obtain  tighter  and  more  descriptive  bounds 
for  (4.3)  and  (4.2)  by  instead  using  the  definition 


— 


max 


||3^fca;||2 

For  the  bound  in  (4.1),  we  can  use  the  definition 

- '\\yk\\Bk\\v'i.,j\h  |||Alkiii2 


=  max ' 


max 


\Bk\h\\ykv'^J\2  Wykxh 

and  for  the  bound  in  (4.4),  we  can  use  the  definition 


}■ 


(5.1) 


(5.2) 


Tfcj  =  max 


(rfcj-i, 


\yk\\Bk\\vi 


max 


\yk\\x\ 


\Bk\\\2\\ykv',,j\2  \\ykx\\2 


-}•  (5.3) 


The  value  in  (5.3)  is  monotonically  increasing  since  the  bound  in  (4.37)  is  a  sum  of 
bounds  from  previous  iterations. 
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Fig.  5.5.  Normality  (top  left),  orthogonality  (top  right),  recurrence  column  error  (bottom  left), 
and  difference  in  recurrence  column  size  (bottom  right)  for  monomial  basis.  Bounds  obtained  using 
Ffc  as  defined  in  (5.1)  for  top  plots,  (5.2)  for  bottom  left  plot,  and  (5.3)  for  bottom  right  plot. 


In  Figures  5.5— 5.7  we  plot  bounds  for  the  same  problem,  bases,  and  s-values  as 
Figures  5.2— 5.4,  but  using  the  new  definitions  of  Comparing  Figures  5.5— 5.7 
to  Figures  5.2— 5.4,  we  see  that  these  bounds  are  better  both  quantitatively,  in  that 
they  are  tighter,  and  qualitatively,  in  that  they  better  replicate  the  shape  of  the 
curves  for  the  measured  normality  and  orthogonality  values.  The  exception  is  for  the 
plots  of  bounds  in  (4.4)  (bottom  right  plots),  for  which  there  is  not  much  difference 
qualitatively.  It  is  also  clear  that  the  new  definitions  of  F^  correlate  well  with  the 
size  of  the  measured  values  (i.e.,  the  shape  of  the  blue  curve  closely  follows  the  shape 
of  the  red  curve).  Note  that,  unlike  the  definition  of  in  Theorem  4.2,  using  the 
definitions  in  (5.1)  — (5.3)  do  not  require  the  assumption  of  linear  independence  of  the 
basis  vectors. 

Although  these  new  bounds  can  not  be  computed  a  priori,  the  right  hand  sides 
of  (5.1),  (5.2),  and  (5.3)  can  be  computed  within  each  inner  loop  iteration  for  the 
cost  of  one  extra  reduction  per  outer  loop.  This  extra  cost  comes  from  the  need  to 
compute  |3^fe|^|(Vfe|,  although  this  could  potentially  be  performed  simultaneously  with 
the  computation  of  Gk  (line  4  in  Algorithm  2).  This  means  that  meaningful  bounds 
could  be  cheaply  estimated  during  the  iterations.  Designing  a  scheme  to  improve 
numerical  properties  using  this  information  remains  future  work. 

6.  Future  work.  In  this  paper,  we  have  presented  a  complete  rounding  error 
analysis  of  the  s-step  Lanczos  method.  The  derived  bounds  are  analogous  to  those  of 
Paige  for  classical  Lanczos  [27],  but  also  depend  on  a  amplification  factor  f^,  which 
depends  on  the  condition  number  of  the  Krylov  bases  computed  every  in  each  outer 
loop.  Our  analysis  confirms  the  empirical  observation  that  the  conditioning  of  the 
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Fig.  5.6.  Normality  (top  left),  orthogonality  (top  right),  recurrence  column  error  (bottom  left), 
and  difference  in  recurrence  column  size  (bottom  right)  for  Newton  basis.  Bounds  obtained  using 
F/c  as  defined  in  (5.1)  for  top  plots,  (5.2)  for  bottom  left  plot,  and  (5.3)  for  bottom  right  plot. 
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Fig.  5.7.  Normality  (top  left),  orthogonality  (top  right),  recurrence  column  error  (bottom  left), 
and  difference  in  recurrence  column  size  (bottom  right)  for  Chebyshev  basis.  Bounds  obtained  using 
Tfc  as  defined  in  (5.1)  for  top  plots,  (5.2)  for  bottom  left  plot,  and  (5.3)  for  bottom  right  plot. 
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Krylov  bases  plays  a  large  role  in  determining  finite  precision  behavior. 

The  next  step  is  to  extend  the  analogous  subsequent  analyses  of  Paige,  in  which 
he  proves  properties  about  Ritz  vectors  and  Ritz  values,  relates  the  convergence  of 
a  Ritz  pair  to  loss  of  orthogonality,  and,  more  recently,  proves  a  type  of  augmented 
backward  stability  for  the  classical  Lanczos  method  [28,  29]. 

Another  area  of  interest  is  the  development  of  practical  techniques  for  improving 
s-step  Lanczos  based  on  our  results.  This  could  include  strategies  for  reorthogonal- 
izing  the  Lanczos  vectors,  (re)orthogonalizing  the  generated  Krylov  basis  vectors,  or 
controlling  the  basis  conditioning  in  a  number  of  ways.  The  bounds  could  also  be  used 
for  guiding  the  use  of  extended  precision  in  s-step  Krylov  methods;  for  example,  if  we 
want  the  bounds  in  Theorem  4.2  for  the  s-step  method  with  precision  e  to  be  similar 
to  those  for  the  classical  method  with  precision  e,  one  must  use  precision  e  «  e/ffc- 

In  this  analysis,  our  upper  bounds  are  likely  large  overestimates.  This  is  in  part 
due  to  our  replacing  T^  with  T^  in  order  to  simplify  many  of  the  bounds.  If  the 
analysis  in  this  paper  is  performed  instead  keeping  both  T^  and  T^  terms,  it  can  be 
shown  that  increasing  the  precision  in  a  few  computations  (involving  the  construction 
and  application  of  the  Gram  matrix  Gk)  can  improve  the  error  bounds  in  Theorem  4.2 
by  a  factor  of  T^.  This  motivates  the  development  of  mixed  precision  s-step  Lanczos 
methods,  which  could  potentially  trade  bandwidth  (in  extra  bits  of  precision)  for 
fewer  total  iterations.  As  demonstrated  in  Section  5,  it  is  also  possible  to  use  a  tighter, 
iteratively  updated  bound  for  T k  which  results  in  tighter  and  more  descriptive  bounds 
for  the  quantities  in  Theorem  4.2. 
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